denominator, we obtain the reads per kilobase per million (RPKM). RPKM applies to
single-end reads. For paired-end reads, the forward and reverse reads of a pair are
aligned together and counted as one fragment, so “fragment” replaces “read” and the
normalized unit of gene expression becomes FPKM (fragments per kilobase per million).
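As a rough illustration, the sketch below computes RPKM assuming the standard definition: the reads mapped to a gene, divided by the library size in millions and by the gene length in kilobases. The function name `rpkm`, its arguments, and the toy numbers are hypothetical. Applying the same arithmetic to fragment counts from paired-end data would give FPKM.

```python
def rpkm(counts, lengths_bp, library_size):
    """Reads per kilobase per million for each gene in one sample.

    counts       -- reads mapped to each gene
    lengths_bp   -- gene lengths in bases
    library_size -- total number of mapped reads in the sample
    """
    per_million = library_size / 1e6           # sequencing-depth scaling factor
    return [
        (k / per_million) / (length / 1e3)     # scale by depth, then by length in kb
        for k, length in zip(counts, lengths_bp)
    ]


# Toy data (made up): two genes with equal counts but different lengths.
values = rpkm(counts=[500, 500], lengths_bp=[1000, 2000], library_size=2_000_000)
```

With equal counts, the gene that is twice as long receives half the RPKM, which is the length adjustment at work.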
5.3.5.2 Transcripts per Million
The transcripts per million (TPM) [25] was proposed as an alternative to RPKM and FPKM
that adjusts for gene-length bias and is used for within-sample gene expression
comparison. The TPM of gene i represents the abundance of reads aligned to gene i relative
to the abundance of reads aligned to the other genes in the same sample. To normalize the
read counts, first divide the number of reads aligned to each gene, $k_i$, by the gene
length in bases, $l_i$, giving a count per base. Then divide each gene's count per base by
the sum of the counts per base over all genes and multiply by 1,000,000 to obtain the
transcripts per million, or TPM.
\[
\mathrm{TPM}_i = \frac{k_i}{l_i} \times \frac{1}{\sum_j \frac{k_j}{l_j}} \times 10^6 \tag{5.2}
\]
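The two steps of Eq. (5.2) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `tpm` and the toy counts and lengths are made up for the example.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for each gene in one sample, following Eq. (5.2)."""
    # Step 1: count per base for every gene (k_i / l_i).
    rates = [k / length for k, length in zip(counts, lengths_bp)]
    # Step 2: divide by the sum of all per-base counts and scale to one million.
    total = sum(rates)
    return [rate / total * 1e6 for rate in rates]


# Toy data (made up): gene B has twice the reads of gene A but is twice as long.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 1000]
tpm_values = tpm(counts, lengths_bp)
```

Genes A and B end up with the same TPM because their counts per base are equal. The TPM values of a sample always sum to one million, which is what makes them comparable within that sample.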
5.3.5.3 Counts per Million Mapped Reads
The counts per million (CPM) mapped reads normalizes the number of reads that map to a
particular gene by correcting for sequencing depth and transcriptome composition bias
[26]. CPM is used for between-sample differential analysis, to compare the expression of
the same gene across different samples. It is not suitable for within-sample gene
expression comparison because it does not adjust for gene length. The CPM of gene i is
defined as the number of reads mapped to the gene, $k_i$, divided by the total number of
mapped reads (the library size), $N$, multiplied by 1,000,000.
\[
\mathrm{CPM}_i = \frac{k_i}{N} \times 10^6 \tag{5.3}
\]
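A minimal sketch of Eq. (5.3) follows. The function name `cpm` and the toy counts are hypothetical, and taking the library size as the sum of the per-gene counts when none is supplied is an assumption of this example.

```python
def cpm(counts, library_size=None):
    """Counts per million for each gene in one sample, following Eq. (5.3)."""
    if library_size is None:
        # Assumption: the library size N is the sum of the per-gene counts.
        library_size = sum(counts)
    return [k / library_size * 1e6 for k in counts]


# Toy data (made up): the same gene counted in two samples of different depth.
sample_a = cpm([100, 900])     # 1,000 mapped reads in total
sample_b = cpm([200, 1800])    # 2,000 mapped reads in total
```

The first gene has twice as many raw reads in sample B as in sample A, yet its CPM is the same in both, because the difference is explained entirely by sequencing depth. Gene length never enters the calculation.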
5.3.5.4 Trimmed Mean of M-values
The Trimmed Mean of M-values (TMM) [27] is used by edgeR for between-sample
differential gene expression. It uses the relative gene expression of two samples: one is
the sample of interest (treated), and the other is the reference that serves as the
baseline for comparison. The gene-wise log-fold change of gene g, $M_g$, is given as:
\[
M_g = \log_2 \frac{Y_{gj} / K_j}{Y_{gr} / K_r} \tag{5.4}
\]
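The sketch below evaluates Eq. (5.4) for a single gene. It assumes that $Y_{gj}$ and $Y_{gr}$ are the observed counts of gene g in the sample of interest j and in the reference r, and that $K_j$ and $K_r$ are the total read counts of those two samples. The function name `m_value` and the numbers are hypothetical, and only this one log-ratio is computed, not the full TMM procedure.

```python
import math


def m_value(y_gj, k_j, y_gr, k_r):
    """Gene-wise log2 fold change M_g of Eq. (5.4).

    y_gj, y_gr -- counts of gene g in the sample of interest and in the reference
    k_j, k_r   -- total read counts of the two samples (assumed meaning of K)
    """
    if y_gj <= 0 or y_gr <= 0:
        # The log-ratio is undefined when either count is zero.
        raise ValueError("M_g requires positive counts in both samples")
    return math.log2((y_gj / k_j) / (y_gr / k_r))


# Toy data (made up): the gene's share of reads is twice as large in the sample.
m_g = m_value(y_gj=200, k_j=1_000_000, y_gr=100, k_r=1_000_000)
```

A positive value means the gene takes up a larger share of reads in the sample of interest than in the reference, a negative value the opposite, and a value near zero indicates similar relative expression.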